{
    "name": "webpage_scraper",
    "description": "This tool extracts up-to-date text from web pages, making it useful for gathering data for analysis. If the user provides multiple URLs, call the tool once per URL and then synthesize the extracted information into a single, comprehensive response. If the user omits the protocol from a URL, prepend https:// before calling the tool.\n\nImportant: webpage_scraper returns only the raw text content of a webpage. It does not provide structural information such as headings, paragraphs, or specific HTML elements.",
    "color": "linear-gradient(rgb(75,205,223), rgb(4,90,12))",
    "iconSrc": "https://raw.githubusercontent.com/FlowiseAI/Flowise/main/packages/components/nodes/documentloaders/Spider/spider.svg",
    "schema": "[{\"id\":0,\"property\":\"url\",\"description\":\"The URL of the webpage to scrape, as provided by the user\",\"type\":\"string\",\"required\":true}]",
    "func": "const fetch = require('node-fetch');\n\n// Prepend https:// if the user omitted the protocol.\nconst rawUrl = String($url).trim();\nconst targetUrl = /^https?:\\/\\//i.test(rawUrl) ? rawUrl : `https://${rawUrl}`;\n\nconst data = {\n  \"depth\": 1,\n  \"limit\": 1,\n  \"proxy_enabled\": true,\n  \"anti_bot\": true,\n  \"request\": \"smart\",\n  \"return_format\": \"text\",\n  \"cache\": true,\n  \"store_data\": true,\n  \"url\": targetUrl\n};\n\nconst apiUrl = 'https://api.spider.cloud/crawl';\n\ntry {\n    const response = await fetch(apiUrl, {\n        method: 'POST',\n        headers: {\n            // Replace SPIDER_API_KEY with your Spider API key.\n            'Authorization': 'Bearer SPIDER_API_KEY',\n            'Content-Type': 'application/json'\n        },\n        body: JSON.stringify(data)\n    });\n    if (!response.ok) {\n        console.error('Network response was not ok:', response.status, response.statusText);\n        return `Error: ${response.status} ${response.statusText}`;\n    }\n    // Return the raw API response body, which includes the extracted page text.\n    const text = await response.text();\n    return text;\n} catch (error) {\n    console.error(error);\n    // Return a message rather than an empty string so the model knows the request failed.\n    return `Error: ${error.message}`;\n}\n\n/*\n * Works well with OpenAI models (gpt-4o and gpt-4o-mini).\n * Inconsistencies may occur with Google models (Gemini 1.5 and Gemini 1.5 Flash).\n * Other models are untested.\n *\n * Scraping options:\n * depth (number): The maximum crawl depth (0 for no limit).\n * limit (number): The maximum number of pages to scrape per website.\n * proxy_enabled (boolean): Route requests through premium proxies.\n * anti_bot (boolean): Enable anti-bot mode, which applies techniques to increase the chance of success.\n * request (string): The request type: 'http', 'chrome', or 'smart'.\n * return_format (string): The format of the returned data (e.g. 'text').\n * cache (boolean): Use HTTP caching for the crawl to speed up repeated runs.\n * store_data (boolean): Store collected resources so they can be downloaded and reused later.\n * url (string): The URL of the resource to scrape.\n *\n * For more options, see:\n * https://spider.cloud/docs/api\n */\n"
}
